In Malaysia, rice, ranked as the third most crucial crop, faces challenges because 
domestic consumption outpaces production, which has led to more frequent 
rice adulteration. This underscores the need to maintain integrity and 
quality standards across the entire supply chain. This study uses an 
electronic nose comprising four metal oxide semiconductor (MOS) gas sensors, 
together with temperature modulation, Principal Component Analysis (PCA), 
and supervised machine learning (classification models), to distinguish the rice 
varieties Bario, Bajong, Borneo Fragrant, Biris, and Jasmine. The study 
evaluated 30 classifiers based on their classification and validation accuracy. 
Sensor data were first extracted from the transient response of the sensors' output 
voltage, yielding a 12-dimensional dataset with response times of 30 s, 50 s, and 
95 s. Classification models trained on this dataset achieved classification 
(training) accuracy of up to 100% and validation accuracy of up to 96%; the 
best-performing models were the subspace discriminant and kernel naive Bayes 
classifiers. An attempt was also made to analyze the frequency response of the 
sensor data for rice classification. Comparing the prediction results in the 
transient and frequency domains showed that the transient response is better 
suited to rice classification. 


1. Introduction 


The increasing population and natural factors affecting 
rice production, coupled with the diverse range of rice 
varieties, have created opportunities for dishonest traders to 
profit by adulterating rice. Adulteration methods include 
blending in lower-quality rice, adding similar-looking materials, 
and withholding clear information about the rice's origin and 
age. Even small amounts of undesirable substances can make 
it difficult to differentiate between genuine and fake rice. 
Such adulterated products pose serious health risks. Recent 
reports have even described the use of plastic rice as an 
adulterant, underscoring the dangers of food adulteration 
[1-5]. This report proposes a rapid, cost-effective, and 
non-invasive approach for geographically tracing and 
classifying Sarawak Premium Rice (Bajong, Bario, Biris, and 
Borneo Fragrant) using an electronic nose built from metal 
oxide semiconductor (MOS) gas sensors. To identify the 
most dependable classification models for this application, a 
variety of popular classifiers were trained and evaluated in 
this study. These included neural networks, naive Bayes, 
linear regression, logistic regression, random forests, and 
support vector machines (SVM) [6]. Classification accuracy 
was compared across these models to determine their 
efficacy for the task at hand. 

In a frequency domain representation, the emphasis is on 
the relationship between the signal's amplitude (or power) 
and frequency, rather than time. This approach is especially 
useful for signals composed of a superposition of multiple 
frequencies [7]. Transient and frequency response methods 
were devised to authenticate rice samples of the Bario, 
Bajong, Borneo Fragrant, Biris, and Jasmine varieties. In the 
transient response approach, classification model training 
and prediction were repeated using sensor responses at 
various points in the sampling cycle. In the frequency 
response approach, the frequencies of dips in the spectrum 
were recorded and their patterns compared for 
identification. Both methods aimed to classify the rice 
varieties based on their distinct response characteristics. 
2. Methodology 
2.1 Preparation, storage, and sampling of rice 

To build the training datasets used to train the 
classification models, a total of 100 rice samples were 
prepared as follows: 

• 20 samples of Bario 
• 20 samples of Bajong 
• 20 samples of Borneo Fragrant 
• 20 samples of Biris 
• 20 samples of Jasmine 

To validate the trained classification models, a further 25 
rice samples were prepared as follows and classified by the 
trained models: 

• 5 samples of Bario 
• 5 samples of Bajong 
• 5 samples of Borneo Fragrant 
• 5 samples of Biris 
• 5 samples of Jasmine 

Figure 1 shows the physical appearance of each type of rice 
sample used in this study. Each sample consisted of 8 grams 
of rice. The samples covered the Sarawak Premium Rice 
varieties Bario, Bajong, Borneo Fragrant, and Biris, together 
with a standard rice type, Jasmine. Each rice sample was 
stored in a zipper bag clearly labeled with the rice variety and 
the sample number. 

This study adopted the MOS gas sensor module, sampling 
process, and feature extraction method used by Lee et al. in 
their studies [8-11]. Sensor response is defined as the change 
in sensor output voltage (the voltage across an external load 
resistor) caused by the change in resistance of the sensing 
material in the sensor. The change in sensor output voltage is 
expressed as a percentage relative to the sensor output 
voltage baseline, which was set to 1.0 V in this study. 
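The response definition above can be sketched as a small function. This is a minimal sketch, assuming the percentage change is computed as (V_out - V_baseline) / V_baseline × 100 against the 1.0 V baseline; the function names are hypothetical and not taken from the study's software.

```python
# Minimal sketch (hypothetical helper names): sensor response expressed as the
# percentage change of the output voltage relative to the calibrated baseline.
BASELINE_V = 1.0  # baseline output voltage used in this study (V)


def sensor_response_percent(v_out: float, v_baseline: float = BASELINE_V) -> float:
    """Percentage change of v_out with respect to v_baseline."""
    if v_baseline == 0:
        raise ValueError("baseline voltage must be non-zero")
    return (v_out - v_baseline) / v_baseline * 100.0


def response_curve(voltages, v_baseline: float = BASELINE_V):
    """Convert a sampled output-voltage trace into a list of percentage responses."""
    return [sensor_response_percent(v, v_baseline) for v in voltages]


if __name__ == "__main__":
    # Hypothetical readings: baseline, then the voltage rising as VOCs arrive.
    print(response_curve([1.0, 1.25, 1.5]))  # [0.0, 25.0, 50.0]
```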

To construct the MOS gas sensor array, four Figaro TGS 
series sensors were used: TGS2600, TGS2602, TGS2620, and 
TGS2611. Each of these MOS gas sensors has a distinct 
selectivity for target gases, as outlined in Table 1 [12-15]. 
The complete circuit configuration is shown in Figure 2. 
The microcontroller chosen for this purpose is the Arduino 
Uno, which collects the sensor data during the sampling 
process and allows the data to be displayed in the Arduino 
software graphical user interface. Figure 3 shows the sample 
holder used to contain rice samples for headspace sampling. 


Table 1. Sensor models used in this study 

| Sensor model | Target gases | Sensor resistance |
|---|---|---|
| TGS2600 | Air contaminants (hydrogen, ethanol, etc.) | 10 to 90 kΩ in air |
| TGS2602 | Air contaminants (VOCs, ammonia, H2S, etc.) | 10 to 100 kΩ in air |
| TGS2620 | Alcohol, solvent vapours | 1 to 5 kΩ in 300 ppm ethanol |
| TGS2611 | Methane, natural gas | 0.68 to 6.8 kΩ in 5000 ppm methane |


Before sampling, the sensors' output voltages were 
calibrated to a baseline of 1.0 V by adjusting the resistance of 
the digital potentiometers (connected as external load 
resistors). During sampling, the sample holder containing the 
rice sample was inserted into the sensing chamber. The 
sensing chamber was isolated from the other pneumatic 
components by normally closed vacuum solenoid valves to 
prevent air or odor exchange. On completion of sampling, the 
solenoid valves were opened to introduce carrier gas, and the 
air pump was activated to purge the volatile organic 
compounds (VOCs) released by the sample and replace them 
with clean air at a flow rate of 2000 cm³/min. This returned 
the sensors to their baseline output voltage (1.0 V) in a 
chamber filled with clean air. Before the rice sampling process 
began, the sensors were warmed up to improve their stability 
by placing a sample of 6 coffee beans in the sensing chamber 
and applying a heater voltage of 5.0 V for 20 minutes. Each 
8-gram rice sample was then measured, and the collected data 
were used for Principal Component Analysis (PCA). 
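The baseline calibration can be illustrated with the standard measuring circuit given in Figaro's datasheets, where the output is the voltage across the load resistor, V_out = V_C × R_L / (R_S + R_L). Solving for R_L gives the load setting that yields the 1.0 V baseline. This is a sketch only: the 5.0 V circuit voltage and the clean-air sensor resistance below are assumptions, not values reported in the text, and a real digital potentiometer can only approximate the computed R_L in discrete steps.

```python
# Sketch of the baseline-calibration arithmetic for one sensor channel.
CIRCUIT_V = 5.0        # assumed circuit voltage V_C (not stated in the text)
TARGET_BASELINE = 1.0  # baseline output voltage used in the study (V)


def output_voltage(r_s: float, r_l: float, v_c: float = CIRCUIT_V) -> float:
    """Voltage across the load resistor: V_out = V_C * R_L / (R_S + R_L)."""
    return v_c * r_l / (r_s + r_l)


def load_for_baseline(r_s: float, v_target: float = TARGET_BASELINE,
                      v_c: float = CIRCUIT_V) -> float:
    """Load resistance R_L giving V_out = v_target for sensor resistance r_s."""
    if not 0 < v_target < v_c:
        raise ValueError("target voltage must lie between 0 V and the circuit voltage")
    return r_s * v_target / (v_c - v_target)


if __name__ == "__main__":
    r_s_air = 20e3  # hypothetical clean-air sensor resistance (ohms)
    r_l = load_for_baseline(r_s_air)
    print(r_l, output_voltage(r_s_air, r_l))  # 5000.0 1.0
```

Solving the divider for R_L, rather than iterating on the potentiometer, shows why each sensor needs its own load setting: sensors with different clean-air resistances need different R_L values to reach the same 1.0 V baseline.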


Figure 1. Sample of (a) Bajong, (b) Jasmine, (c) Borneo Fragrant, (d) Bario and (e) Biris 
Temperature modulation of the MOS gas sensors was 
achieved by applying three heater voltages (4.6 V, 4.8 V, and 
5.0 V), producing three configurations of gas sensor 
sensitivity and selectivity. Data collection began at the 10th 
second, with the heater voltage (sensitivity level) increased by 
0.2 V every 30 seconds until reaching 5.0 V, and then held 
constant for the next 30 seconds. It was then decreased by 
0.2 V every 30 seconds until reaching the lower threshold of 
4.6 V. In the sampling procedure shown in Figure 4, the total 
sampling time of 140 seconds (including a 10-second baseline 
period), a maximum purging duration of 20 seconds, and a 
maximum recovery phase of 200 seconds mean that each 
sampling session required at most approximately 360 
seconds (6 minutes). This is notably faster than the 30-minute 
sampling cycle typically associated with the conventional 
GC-MS method. 
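The temperature-modulation timing described above can be expressed as a lookup of the heater voltage at each second. This sketch assumes one reading of the text: 4.6 V from 10 s, 4.8 V from 40 s, 5.0 V from 70 s (held for 30 s), 4.8 V from 100 s, and 4.6 V from 130 s until the 140 s window closes. The step boundaries and the helper names are assumptions; the heater level during the 10 s baseline is not stated, so the sketch returns None there. The cycle-time helper reproduces the 140 + 20 + 200 = 360 s worst case.

```python
from typing import Optional

BASELINE_S = 10       # baseline period before modulation (s), from the text
STEP_S = 30           # duration of each heater-voltage step (s)
LEVELS_V = (4.6, 4.8, 5.0, 4.8, 4.6)  # assumed step sequence (up, hold, down)
SAMPLING_S = 140      # total sampling time including baseline (s)
PURGE_MAX_S = 20      # maximum purging duration (s)
RECOVERY_MAX_S = 200  # maximum recovery phase (s)


def heater_voltage(t: float) -> Optional[float]:
    """Heater voltage (V) at time t seconds into sampling; None during the baseline."""
    if not 0 <= t < SAMPLING_S:
        raise ValueError("t must lie inside the 140 s sampling window")
    if t < BASELINE_S:
        return None  # heater level during the baseline is not stated in the text
    step = int((t - BASELINE_S) // STEP_S)
    return LEVELS_V[min(step, len(LEVELS_V) - 1)]


def max_cycle_seconds() -> int:
    """Worst-case length of one cycle: sampling + purging + recovery."""
    return SAMPLING_S + PURGE_MAX_S + RECOVERY_MAX_S


if __name__ == "__main__":
    print([heater_voltage(t) for t in (5, 15, 45, 75, 105, 135)])
    print(max_cycle_seconds(), "s")  # 360 s, i.e. 6 minutes
```

Under this reading the final 4.6 V step lasts only 10 s before sampling ends, which is what makes the total come to exactly 140 s.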


Figure 2. Complete circuit configuration of the MOS gas sensor array (labelled components: sensing chamber with four MOS gas sensors; air filter with activated carbon; pneumatic system relay controller; solenoid valve; purging air flow direction; microcontroller with digital potentiometers for baseline calibration, connected to a laptop) 


Figure 3. Sample holder 


2.2 Sample data analysis 

A total of 100 training samples (20 samples per class) 
and 25 validation samples (5 samples per class) were 
collected. Principal component analysis (PCA) was first 
applied to the data matrix as a preprocessing step. The 
preprocessed data matrix was then used to train 33 
classification models from the broad categories of decision 
trees, discriminant analysis, logistic regression classifiers, 
naive Bayes classifiers, SVM, k-nearest neighbors (KNN), 
ensemble classifiers, and neural networks. All 33 
classification models were evaluated on their classification 
accuracy on the training dataset and their prediction 
accuracy on the validation samples (five per class). The 
model with the highest accuracy was selected and, where 
necessary, fine-tuned. 
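PCA preprocessing of the kind described above can be sketched with NumPy's SVD. This is not the study's MATLAB workflow: it assumes the features are only mean-centred (the text does not say whether they were also scaled), and the data in the usage example are synthetic placeholders with the study's shapes (100 × 12 training, 25 × 12 validation), not measured responses. The helper names are hypothetical.

```python
import numpy as np


def pca_fit(x: np.ndarray, n_components: int):
    """Fit PCA on a training matrix x of shape (n_samples, n_features).

    Returns (mean, components, explained_variance_ratio); rows of `components`
    are orthonormal principal axes ordered by decreasing variance.
    """
    mean = x.mean(axis=0)
    # SVD of the centred data: the right singular vectors are the principal axes.
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    variance = s ** 2
    return mean, vt[:n_components], variance[:n_components] / variance.sum()


def pca_transform(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project samples onto the fitted principal components (PC scores)."""
    return (x - mean) @ components.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic placeholder data with the study's shapes, NOT measured responses.
    x_train = rng.normal(size=(100, 12))  # 100 training samples x 12 features
    x_val = rng.normal(size=(25, 12))     # 25 validation samples
    mean, comps, ratio = pca_fit(x_train, n_components=4)
    train_scores = pca_transform(x_train, mean, comps)
    val_scores = pca_transform(x_val, mean, comps)  # reuse training mean and axes
    print(train_scores.shape, val_scores.shape, ratio.round(3))
```

Validation samples are projected with the training mean and axes, so no information from the validation set leaks into the fitted components.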


2.3 Signal processing on the frequency domain 

Figure 5 outlines the workflow for signal processing in 
the frequency domain, which was performed in MATLAB. The 
program transformed the data from each sensor response 
curve into a frequency domain graph. This graph was used to 
characterize the signal behavior of each rice sample by 
identifying dips in the signal and recording their frequencies. 


$$f = \frac{1}{T} \qquad (1)$$

The sample rate f was determined to be 1 Hz using 
equation (1), since the overall sampling process takes 
approximately 130 seconds with a sampling interval T of 
1 second between samples. This process was conducted for 
every rice variety, and the ranges of frequencies associated 
with the dips were then compared to identify patterns. Dips 
are the main features extracted from a frequency domain 
graph: they are frequencies at which the signal's amplitude 
or power is significantly lower than at neighboring 
frequencies. Dips often correspond to particular features of 
the signal, so they serve as markers that can reveal 
information about the underlying signal and help in 
interpreting and analyzing the data. In the example in 
Figure 6, three dips are observed between 320.1 mHz and 
386.3 mHz. 
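The dip-extraction step can be sketched in plain Python. This is a substitute for MATLAB's Signal Analyzer, not a reproduction of it: it assumes a simple periodogram computed with a direct DFT, with power in dB, and treats a dip as a strict local minimum of the spectrum. With f = 1 Hz the highest resolvable frequency is the Nyquist frequency of 0.5 Hz (500 mHz), which is consistent with all reported dips lying below 500 mHz. Function names are hypothetical.

```python
import cmath
import math


def power_spectrum_db(signal, fs=1.0):
    """One-sided periodogram of a real signal, computed with a direct DFT.

    Returns (freqs_hz, power_db) for bins k = 0 .. N//2, where f_k = k * fs / N.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC offset (sensor baseline)
    freqs, power_db = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2 / n
        freqs.append(k * fs / n)
        power_db.append(10 * math.log10(power + 1e-20))  # floor avoids log10(0)
    return freqs, power_db


def find_dips(values):
    """Indices where a value is strictly lower than both of its neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


def dip_frequencies_mhz(signal, fs=1.0):
    """(frequency in mHz, power in dB) for every dip in the power spectrum."""
    freqs, power_db = power_spectrum_db(signal, fs)
    return [(round(freqs[i] * 1000, 1), round(power_db[i], 1)) for i in find_dips(power_db)]


if __name__ == "__main__":
    # Synthetic 130 s response sampled at 1 Hz (not measured data).
    demo = [math.exp(-t / 40) + 0.05 * math.sin(2 * math.pi * 0.35 * t) for t in range(130)]
    for f_mhz, p_db in dip_frequencies_mhz(demo):
        print(f"dip at {f_mhz} mHz, {p_db} dB")
```

With 130 samples the bin spacing is 1/130 Hz (about 7.7 mHz), which bounds how finely the dip frequencies can be resolved.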


3. Results and discussion 
3.1 Transient response 

Table 2 shows the legend for the rice sample data on the 
principal component (PC) scatter plots generated from the 
processed transient responses of the sensors. Two datasets 
were constructed for comparison: a 4-dimensional dataset 
(from the response time of 80 s) and a 12-dimensional 
dataset (from the response times of 30, 50, and 95 seconds). 
Classification accuracy was assessed using the confusion 
matrix of a Cubic SVM model, as shown in Table 3. The PC3 
vs. PC1 scatter plot in Table 3 is included for illustration only: 
it shows interclass separation clear enough to be seen by 
eye, which is consistent with the high accuracy obtained in 
classification model training. 
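Building the 4- and 12-dimensional datasets amounts to reading each sensor's response curve at the chosen response times (one time for 4 features, three times for 12 features across four sensors). A minimal sketch, assuming the curves are sampled at 1 Hz from t = 0 and that features are ordered sensor by sensor; both choices and the function name are assumptions rather than details from the study.

```python
SENSORS = ("TGS2600", "TGS2602", "TGS2620", "TGS2611")


def extract_features(curves, response_times=(30, 50, 95), fs=1.0):
    """Read every sensor's response curve at the given response times (s).

    curves: {sensor name: list of responses sampled at fs Hz starting at t = 0}.
    Returns a flat feature vector of len(SENSORS) * len(response_times) values,
    ordered sensor by sensor (an assumed ordering).
    """
    features = []
    for name in SENSORS:
        curve = curves[name]
        for t in response_times:
            idx = round(t * fs)  # sample index closest to time t
            if idx >= len(curve):
                raise ValueError(f"{name}: curve too short for t = {t} s")
            features.append(curve[idx])
    return features


if __name__ == "__main__":
    # Synthetic curves (not measured data): response = 1000 * sensor index + t.
    demo = {name: [1000.0 * i + t for t in range(140)] for i, name in enumerate(SENSORS)}
    print(len(extract_features(demo)))    # 12-dimensional vector
    print(extract_features(demo, (80,)))  # 4-dimensional vector
```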


Table 2. Legend of the transient response scatter plot 

| Rice | Abbreviation | Color of data point on the scatter plot |
|---|---|---|
| Bario | BR | Yellow |
| Bajong | BJ | Orange |
| Borneo Fragrant | BH | Blue |
| Biris | BS | Purple |
| Jasmine | JS | Green |
Figure 4. Sampling procedure [flowchart; legible step labels: turn on the calibration on the microcontroller; baseline correction achieved?; increase the sensitivity level every 30 s until sensitivity level 3; maintain sensitivity level 3 for 30 s; extract the sample out of the sensing chamber; switch on the air pump to purge the sensing chamber] 

Figure 5. Procedures for frequency response signal processing [flowchart; legible step labels: create a CSV file for the collected data; run the program; open Signal Analyzer; drag the signal from the Workspace Browser to the axes] 

Figure 6. Example of dips: three dips are observed between 320.1 mHz and 386.3 mHz 


Table 3. Classification model training accuracy: comparison between 4-dimensional and 12-dimensional dataset 
Table 4. Classification model training accuracy: comparison between different sets of response times 
For the 4-dimensional dataset, the confusion matrix 
indicated an accuracy of 99%, with one misclassification: a 
Borneo Fragrant rice sample incorrectly predicted as Biris 
rice. In contrast, the confusion matrix for the 12-dimensional 
dataset showed 100% accuracy, with all rice samples correctly 
classified. Consequently, the 12-dimensional dataset (three 
response times for each of the four MOS gas sensors) was 
adopted for further analysis. 
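The confusion-matrix accuracy quoted above can be checked with a few lines of Python. In this sketch (helper names hypothetical), rows are true classes and columns are predicted classes; the usage example reproduces the 4-dimensional case, where one of the 20 Borneo Fragrant training samples is predicted as Biris out of 100 samples, giving 99%.

```python
from collections import Counter

CLASSES = ["Bario", "Bajong", "Borneo Fragrant", "Biris", "Jasmine"]


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Square count matrix: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Percentage of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


if __name__ == "__main__":
    true = [c for c in CLASSES for _ in range(20)]  # 20 training samples per class
    pred = list(true)
    pred[true.index("Borneo Fragrant")] = "Biris"   # the single misclassification
    print(accuracy(confusion_matrix(true, pred)))   # 99.0
```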

Three 12-dimensional datasets were generated, each 
with different response times: Set 1 (30, 60, 95 seconds), 
Set 2 (30, 50, 90 seconds), and Set 3 (30, 50, 95 seconds). 
The sets were compared using the confusion matrices of a 
Coarse Tree classification model, as shown in Table 4. 
Although Set 3 still produced some classification errors, it 
was selected for further analysis. 

Table 5 lists the 14 best-performing classification 
models, all of which achieved a classification (training) 
accuracy of 100%. For the validation stage, three validation 
sets were likewise compared (Table 6), each corresponding 
to a dataset generated at different response times: Set 1 (30, 
60, 95 seconds, with PC4 and PC1), Set 2 (30, 50, 90 seconds, 
with PC1 and PC3), and Set 3 (30, 50, 95 seconds, with PC3 
and PC1). The Subspace Discriminant classification model 
was used for the comparison. Set 3 performed best, with a 
validation accuracy of 96%, the highest among the compared 
sets. 


Table 5. Classification models that achieved 100% classification (training) accuracy 

| Classification model | Classification accuracy of training data (% of 100 training samples), Set 3 (30 s, 50 s, 95 s) |
|---|---|
| Quadratic Discriminant | 100 |
| Quadratic SVM | 100 |
| Cubic SVM | 100 |
| Fine Gaussian SVM | 100 |
| Medium Gaussian SVM | 100 |
| Fine KNN | 100 |
| Weighted KNN | 100 |
| Bagged Trees | 100 |
| Narrow Neural Network | 100 |
| Medium Neural Network | 100 |
| Kernel Naive Bayes | 100 |
| Subspace KNN | 100 |
| Wide Neural Network | 100 |
| Bilayer Neural Network | 100 |
Table 7 shows the six classification models with the 
highest validation accuracy. Subspace Discriminant and 
Kernel Naive Bayes both achieved 96% accuracy, while 
Coarse Gaussian SVM, Cosine KNN, Weighted KNN, and 
Bagged Trees achieved 92%. 


3.2 Frequency response 

An exclusion method can be used to identify a rice 
sample: the frequency ranges of its dips are compared with 
those of each class, and classes that do not match are 
eliminated. Table 8 compares the ranges of dip frequencies 
across the training and validation data for the four sensors. 
Identifying the rice proved difficult because the frequency 
ranges differed between the training and validation sets: 
some validation dips fell within the training range, while 
others did not. Moreover, in some cases the frequency range 
of a sensor for one rice sample overlapped with that of 
another rice sample, so the range did not pertain exclusively 
to one sample. Hence, a rice sample cannot be identified 
solely by comparing the frequency ranges in which dips 
occur. 
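The exclusion method described above can be sketched as set elimination. This sketch assumes that a class is kept only if every observed dip of a sensor falls inside that class's training range for the same sensor; the function name, class and sensor names, and ranges in the example are hypothetical, not values from Table 8. Overlapping ranges leave more than one candidate, which is the ambiguity reported above.

```python
def exclude_candidates(dips_by_sensor, training_ranges):
    """Return the classes that survive elimination.

    dips_by_sensor: {sensor: [dip frequency in mHz, ...]} for the unknown sample.
    training_ranges: {class: {sensor: (low_mHz, high_mHz)}} from the training data.
    A class is eliminated as soon as one observed dip falls outside its range
    for that sensor (or it has no range recorded for that sensor).
    """
    candidates = set(training_ranges)
    for sensor, dips in dips_by_sensor.items():
        for cls in list(candidates):
            bounds = training_ranges[cls].get(sensor)
            if bounds is None or any(not bounds[0] <= f <= bounds[1] for f in dips):
                candidates.discard(cls)
    return candidates


if __name__ == "__main__":
    # Hypothetical training ranges for two sensors (not the values in Table 8).
    ranges = {
        "A": {"S1": (300.0, 400.0), "S2": (320.0, 380.0)},
        "B": {"S1": (350.0, 450.0), "S2": (360.0, 470.0)},
        "C": {"S1": (460.0, 490.0), "S2": (300.0, 340.0)},
    }
    print(sorted(exclude_candidates({"S1": [370.0]}, ranges)))                 # ['A', 'B']
    print(sorted(exclude_candidates({"S1": [370.0], "S2": [375.0]}, ranges)))  # ['A', 'B']
    print(sorted(exclude_candidates({"S1": [420.0], "S2": [400.0]}, ranges)))  # ['B']
```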

A second comparison was made using the ranges of dip 
power in Table 9 to determine whether the rice could be 
correctly identified. However, the outcome mirrored that of 
the frequency range comparison: the power spectrum range 
of a sensor for a rice sample did not closely match the 
validation set, and some ranges overlapped with those of 
other samples. This comparison method was therefore 
deemed unsuitable for rice classification. 

Signal processing in the frequency domain may not be 
suitable for classification when dealing with a larger number 
of sample types, such as the five types of rice in this case. 
With more sample types, overlaps between frequency 
responses become more common, making classification more 
difficult. Restricting the classification to fewer types may 
yield more reliable results, as this reduces the likelihood of 
overlaps and makes the frequency response of each type 
more distinctive. 


Table 7. Classification models with the highest validation accuracy 

| Classification model | Prediction accuracy of validation data (% of 25 validation samples), Set 3 (30 s, 50 s, 95 s) |
|---|---|
| Subspace Discriminant | 96 |
| Kernel Naive Bayes | 96 |
| Coarse Gaussian SVM | 92 |
| Cosine KNN | 92 |
| Weighted KNN | 92 |
| Bagged Trees | 92 |
Table 6. Classification model validation accuracy: comparison between different sets of response times 
Table 8. Range of sensors dips frequency across training and validation data: comparison between five classes of samples 

Range of dips frequency (mHz); T = training data, V = validation data 

| Sensor | Bario (T) | Bario (V) | Bajong (T) | Bajong (V) | Borneo Fragrant (T) | Borneo Fragrant (V) | Biris (T) | Biris (V) | Jasmine (T) | Jasmine (V) |
|---|---|---|---|---|---|---|---|---|---|---|
| TGS2600 | 411.1–483.4 | 320.1–422.2 | 358.7–460.1 | 354.3 | 329.2–402.9 | 352.1–417.5 | 346.0–487.3 | 370.8–480.1 | 319.5–469.1 | 370.6–445.9 |
| TGS2602 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| TGS2620 | 346.0–377.5 | 316.2–477.4 | 448.1 | 369.7–418.9 | 311.2–469.9 | 384.2–483.4 | 317.3–419.6 | 314.5 | 380.2–465.2 | 406.2–429.9 |
| TGS2611 | 352.1–385.5 | 352.6–424.4 | 351.0–469.1 | 460.3–483.4 | 355.9–482.7 | 358.1–485.6 | 421.6–485.6 | 317.3–486.5 | 456.4 | 426.5–490.1 |

TGS2602 (column positions uncertain; seven values listed, in order): 419.4; 380.8–450.9; 383.5–417.8; 327.8–484.6; 370.3–440.9; 473.7; 321.2–469.6. 


Table 9. Range of sensors dips power spectrum across training and validation data: comparison between five classes of samples 

Range of dips power spectrum (dB); T = training data, V = validation data 

| Sensor | Bario (T) | Bario (V) | Bajong (T) | Bajong (V) | Borneo Fragrant (T) | Borneo Fragrant (V) | Biris (T) | Biris (V) | Jasmine (T) | Jasmine (V) |
|---|---|---|---|---|---|---|---|---|---|---|
| TGS2600 | -97.1 to -77.9 | -110.2 to -73.1 | -100 to -71.7 | -84.7 | -94.7 to -67.5 | -91.5 to -81.1 | -101.1 to -76.6 | -108.4 to -79.8 | -94.0 to -69.1 | -107.6 to -82.3 |
| TGS2602 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| TGS2620 | -89.0 to -83.9 | -89.6 to -76.9 | -74.4 | -91.5 to -74.6 | -84.6 to -67.2 | -87.2 to -75.3 | -103.2 to -74.4 | -79.3 | -84.5 to -75.4 | -94.7 to -85.7 |
| TGS2611 | -96.9 to -89.4 | -77.7 to -76.2 | -84.5 to -77.4 | -89.3 to -84.1 | ? | -107.4 to -78.9 | -90.1 to -87.7 | -101.9 to -84 | -98.5 | -96.4 to -89.5 |

TGS2602 (column positions uncertain; seven values listed, in order): -81.3; -86.5 to -81.3; -87.7 to -82.7; -99.9 to -79.2; -95.4 to -88.0; -97.4; -103.7 to -82.7. 
TGS2611, Borneo Fragrant (T): range unclear; an additional value of -72.7 is listed for this row. 
4. Conclusion 


Using the transient response, the machine learning 
models achieved an optimal accuracy of 100% on the training 
dataset. When applied to the validation dataset, the method 
also yielded high accuracy, above 80% and up to 96% for the 
best models. This high accuracy suggests that the machine 
learning models effectively identify most rice samples by 
recognizing their distinctive characteristics. In contrast, the 
signal processing on the frequency response did not meet 
expectations: this method could not capture the behaviors 
exhibited by the rice samples, so the analysis of frequency 
dips did not help identify the rice samples as anticipated. In 
short, a machine learning method based on the transient 
response is recommended for the classification of rice 
samples. 